1 Load Source File

Data pre-processing is included in this step: special characters and a minimal set of stop-words are removed.
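The cleaning code itself is not shown in this report. As a minimal sketch of what such a step could look like in base R (the function name `clean_text`, the stop-word list and the set of characters kept are all assumptions for illustration, not the ones actually used):

```r
# Hypothetical minimal stop-word list; the real list is not shown in the report.
stop_words <- c("the", "a", "an", "is", "and")

clean_text <- function(x, stop_words) {
  # Replace everything except letters, digits, whitespace and basic
  # sentence punctuation with a space.
  x <- gsub("[^[:alnum:][:space:].,!?']", " ", x)
  # Drop stop-words as whole words only, ignoring case.
  pattern <- paste0("\\b(", paste(stop_words, collapse = "|"), ")\\b")
  x <- gsub(pattern, " ", x, ignore.case = TRUE, perl = TRUE)
  # Collapse the repeated whitespace left behind by the removals.
  trimws(gsub("[[:space:]]+", " ", x))
}

cleaned <- clean_text("The keyboard is GREAT & durable!!", stop_words)
print(cleaned)
```

Sentence punctuation is kept on purpose in this sketch, since the later sentence-level scoring needs sentence boundaries.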

2 Sentiment Analysis by Sentence Using ‘sentimentr’ Package

Get relevant columns

cols <- c('recid', 'item_id', 'user_id', 'text')
reviews2.text <- as.data.frame(reviews2.csv[, cols])

The code block below computes the sentiment score at the sentence level. It is not executed here because scoring every sentence of every review takes around 10 minutes.

The sentiment score is generated using the sentimentr package.

## get sentiment by sentence using sentimentr package
# reviews_sentences <- reviews2.text %>%
#   get_sentences(text) %>%
#   mutate(sentence_sentiment = sentimentr::sentiment(text)$sentiment)

Load the pre-computed sentiment score by sentence

## read the sentiment analysis result (using sentimentr package)
reviews_sentences <- read.csv('~/Dropbox/Eugenie/data/processed/reviews_sentences.csv')
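The compute-once, load-afterwards pattern used here can be sketched as a small helper. This is only an illustration: `cached_csv`, `slow_scores` and the temporary file path are hypothetical names, and the toy data frame stands in for the real sentence-level scores.

```r
# Hypothetical helper: run an expensive computation once and cache it as CSV.
cached_csv <- function(path, compute) {
  if (file.exists(path)) {
    # Cache hit: skip the expensive step and read the stored result.
    return(read.csv(path, stringsAsFactors = FALSE))
  }
  result <- compute()
  write.csv(result, path, row.names = FALSE)
  result
}

# Toy stand-in for the slow sentence-level scoring step.
slow_scores <- function() {
  data.frame(recid = c(1, 1, 2), sentence_sentiment = c(0.5, -0.1, 0.3))
}

cache_path <- file.path(tempdir(), "reviews_sentences_demo.csv")
first  <- cached_csv(cache_path, slow_scores)  # computes (or reuses an existing cache)
second <- cached_csv(cache_path, slow_scores)  # reads the cached file
```

Writing with `row.names = FALSE` avoids an extra index column appearing when the file is read back.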

2.1 Sentiment Analysis with the Mean of Sentences

Get the sentiment of each review by taking the mean of its sentence scores

sentiment_reviews_sentence.mean <- reviews_sentences %>%
  group_by(recid) %>%
  summarize(sentiment_mean = mean(sentence_sentiment),
            sentence_count = n()) %>%
  ungroup()

Join the mean sentiment score with other selected columns

cols <- c('recid','item_id','rating','helpful_yes','helpful_total',
          'image_count','word_count','brand_repeat',
          'incentivized','is_deleted','verified_purchaser')
reviews2.text <- reviews2.csv[,cols]
reviews2.text <- merge(reviews2.text, sentiment_reviews_sentence.mean, by='recid')
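One thing to keep in mind about this join: `merge()` performs an inner join by default, so any review without a row in the sentiment table is dropped. A toy illustration (the data frames below are made up for the example):

```r
# Toy data: review 2 has no sentence-level scores.
reviews <- data.frame(recid = c(1, 2, 3), rating = c(5, 4, 1))
scores  <- data.frame(recid = c(1, 3), sentiment_mean = c(0.4, -0.2))

# Default (all = FALSE): only recids present in both tables survive.
inner <- merge(reviews, scores, by = "recid")

# all.x = TRUE keeps every review and fills missing scores with NA.
left <- merge(reviews, scores, by = "recid", all.x = TRUE)

print(inner)
print(left)
```

If keeping unscored reviews matters, `all.x = TRUE` is the option to reach for; the `na.rm = TRUE` in the summary step would then handle the resulting `NA` values.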

2.1.1 Summary stats checks

reviews2.text[, c('incentivized','sentiment_mean')] %>%
  group_by(incentivized) %>%
  summarize_all(mean, na.rm = TRUE)
## # A tibble: 2 x 2
##   incentivized     sentiment_mean
##   <fct>                     <dbl>
## 1 non-incentivized          0.293
## 2 incentivized              0.254

Note: Surprisingly, the mean sentiment of incentivized reviews (0.254) is lower than that of non-incentivized reviews (0.293). However, the ‘sentimentr’ scores are not perfectly reliable either.

For example, the incentivized review below is positive in content, yet several of its clearly positive sentences (e.g. "I love the mechanical key response ...") receive negative scores.

knitr::kable(reviews_sentences[reviews_sentences$recid=='100125154',c('text','sentence_sentiment')],
             caption = "An Example of Incentivized Review with Positive Content but Negative Sentiment Score", floating.environment="sidewaystable")
An Example of Incentivized Review with Positive Content but Negative Sentiment Score
text sentence_sentiment
218725 Mpow Mechanical Gaming Keyboard,87 Keys Anti-Ghosting PC Gaming Keyboard with Blue SwitchesUpdate: My dad has been using this keyboard for a while, here is his additional review: Very nice keyboard, nice, great key feel, accurate. 0.1083333
218726 I love the mechanical key response, 4 stars, would have given 5 if the keys light up. -0.0774597
218727 Very nice, HIGHLY recommend!! 0.9000000
218728 My order for the Mpow Mechanical Gaming Keyboard,87 Keys Anti-Ghosting PC Gaming Keyboard with Blue Switches arrived in the mail quickly and timely thanks to Amazon Prime which is well worth the money if you order regularly from Amazon especially for the added free benefits of Amazon music and Amazon Video in addition to the free 2-Day Prime shipping. 0.9308070
218729 Mechanical keyboards have been making a surge in popularity among the tech-literate crowd due to the superior tactile response and feel that a standard keyboard lacks. 0.2116951
218730 For those that spend at least 8 hours a day behind a keyboard, you’d be wise to make an investment in your typing experience. 0.1020621
218731 If you’re looking to make the leap to a mechanical keyboard this year, or maybe just looking to add to your keyboard stable, this a great one!!! 0.2834734
218732 I purchased this keyboard for my dad who is an avid computer gamer. 0.2080126
218733 He really likes that the keyboard is mechanical and it works great!! 0.4763140
218734 He enjoys playing World of Tanks even more with this awesome keyboard. -0.0635085
218735 He really likes that the keyboard is contoured to your hands, its keeps his fingers from getting sore!! -0.0235702
218736 He love the fast responding time of the keys, makes for a better shot and faster tank!!! 0.4244373
218737 Featureso Mechanical Keyboardo Durableo Easy to Useo High QualityWe’ve had this keyboard for a few days now and it's really great for the money. 0.3883099
218738 The keyboard has tactile clicky switches that are mounted on a metal plate that also serves as the top surface of the keyboard, similar to other popular keyboards. 0.2834734
218739 Yes, it makes that satisfying THOCK sound as you type. 0.5692100
218740 The keys appear to be ABS plastic and the legends are printed on top. 0.2672612
218741 They don't appear to have any sort of coating on the keys so the legends may wear down more, but that's not a performance issue. 0.2598076
218742 In the market today, there are many keyboards to choose from, from cheap to real expensive. -0.3375000
218743 This is a win for me, this keyboard is affordable, works great and is very durable.Disclaimer: I was not compensated for this review, however I did receive the product for a discounted price for my honest and true review. 0.9052020
218744 All opinions are my own and not influenced in any way.5 stars – I love this item! 0.1875000
218745 I highly recommend everyone purchase this product!4 stars – I like it, has a few flaws, but I would purchase again.3 stars – The item was just ok, it worked, probably will not buy it again2 stars – There are probably worse products than this, but there are definitely better choices.1 star – I dislike this product, and I wish I had not bought it, I can’t recommend it to others. 0.8528159
218746 I will update my review in the future if I run into any problems with the product or if I find something I think would be of value to the customer. -0.2783882
218747 Thank you for taking the time to read my review, if you found it helpful please give me a “helpful” vote. 0.7855844

As the table shows, sentence sentiment varies considerably within this particular review. It could serve as an example for further analysis of extreme variance within individual reviews.

However, the variation within non-incentivized reviews is likely to be less pronounced, since they are much shorter.
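A first step towards such an analysis could be the per-review standard deviation of the sentence scores, next to the sentence count. The sketch below uses base R on made-up toy data; with the real data, `toy` would be replaced by `reviews_sentences`.

```r
# Toy data: review "a" swings between positive and negative sentences,
# review "b" is short and stable.
toy <- data.frame(
  recid = c("a", "a", "a", "b", "b"),
  sentence_sentiment = c(0.9, -0.3, 0.6, 0.2, 0.3)
)

# Standard deviation and number of sentences per review.
sentiment_sd <- tapply(toy$sentence_sentiment, toy$recid, sd)
sentence_n   <- tapply(toy$sentence_sentiment, toy$recid, length)

spread <- data.frame(
  recid          = names(sentiment_sd),
  sentiment_sd   = as.numeric(sentiment_sd),
  sentence_count = as.integer(sentence_n)
)
print(spread)
```

Note that `sd()` returns `NA` for single-sentence reviews, so those would need to be handled separately.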

2.1.2 Plots

Boxplot: rating vs. sentence_sentiment

2.1.3 Fixed Effect Linear Model

## fixed-effects linear model
## use the mean sentence sentiment score in place of the rating
formula.fe <- sentiment_mean ~ incentivized + is_deleted + verified_purchaser
model.fe <- plm(data = reviews2.text, formula = formula.fe, index = c('item_id'), model = 'within')
# get the model summary
summary(model.fe)
## Oneway (individual) effect Within Model
## 
## Call:
## plm(formula = formula.fe, data = reviews2.text, model = "within", 
##     index = c("item_id"))
## 
## Unbalanced Panel: n = 101, T = 29-10134, N = 264016
## 
## Residuals:
##     Min.  1st Qu.   Median  3rd Qu.     Max. 
## -3.29888 -0.19515 -0.02265  0.17020  2.69732 
## 
## Coefficients:
##                              Estimate Std. Error t-value  Pr(>|t|)    
## incentivizedincentivized   -0.0172033  0.0076676 -2.2436   0.02486 *  
## is_deleteddeleted           0.0201613  0.0025538  7.8946 2.923e-15 ***
## verified_purchaserverified  0.0190881  0.0025959  7.3531 1.943e-13 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Total Sum of Squares:    24635
## Residual Sum of Squares: 24624
## R-Squared:      0.00043385
## Adj. R-Squared: 4.3737e-05
## F-statistic: 38.1826 on 3 and 263912 DF, p-value: < 2.22e-16
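For intuition on what `model = 'within'` does: the coefficient estimates are the ones obtained after subtracting each item's mean from the outcome and the regressors, which in turn match a regression with one dummy per item. The sketch below checks this on simulated toy data using base R only; the data, the effect size and the helper `demean` are assumptions for illustration, and standard errors are not compared.

```r
set.seed(1)
toy <- data.frame(
  item_id      = rep(c("A", "B", "C"), each = 4),
  incentivized = rep(c(0, 1, 0, 1), times = 3)
)
# Item-specific baseline, an assumed effect of -0.02, plus noise.
toy$sentiment_mean <- rep(c(0.1, 0.3, 0.5), each = 4) -
  0.02 * toy$incentivized + rnorm(12, sd = 0.05)

# Within transformation: subtract each item's mean.
demean <- function(v, g) v - ave(v, g)
y_within <- demean(toy$sentiment_mean, toy$item_id)
x_within <- demean(toy$incentivized, toy$item_id)

# Regression on demeaned data (no intercept) ...
beta_within <- coef(lm(y_within ~ x_within - 1))[["x_within"]]
# ... gives the same slope as a regression with item dummies.
beta_dummy <- coef(lm(sentiment_mean ~ incentivized + factor(item_id),
                      data = toy))[["incentivized"]]

print(c(within = beta_within, dummy = beta_dummy))
```

Because item-level means are removed, the coefficients above are identified only from differences between reviews of the same item.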

2.1.4 Correlation: rating vs. review sentiment

cor.test(reviews2.text$rating, reviews2.text$sentiment_mean, method = "pearson")
## 
##  Pearson's product-moment correlation
## 
## data:  reviews2.text$rating and reviews2.text$sentiment_mean
## t = 297.47, df = 264014, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.4981655 0.5038793
## sample estimates:
##       cor 
## 0.5010279
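`cor.test()` computes a single coefficient per call, and the output above is Pearson's. Since ratings are ordinal, the rank-based coefficients may also be of interest; a sketch of how all three could be obtained, on made-up toy vectors rather than the review data:

```r
# Toy vectors standing in for rating and review sentiment.
rating    <- c(1, 2, 3, 4, 5, 5, 4, 2)
sentiment <- c(-0.4, -0.1, 0.0, 0.3, 0.6, 0.2, 0.5, 0.1)

methods <- c("pearson", "kendall", "spearman")

# One cor.test() call per method; exact = FALSE avoids warnings about
# exact p-values when the data contain ties (it is ignored for Pearson).
estimates <- sapply(methods, function(m) {
  unname(cor.test(rating, sentiment, method = m, exact = FALSE)$estimate)
})
print(estimates)
```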

2.2 Sentiment Analysis with the Sum of Sentences (out of Curiosity)

Get the sentiment of each review by taking the sum of its sentence scores

sentiment_reviews_sentence.sum <- reviews_sentences %>%
  group_by(recid) %>%
  summarize(sentiment_sum = sum(sentence_sentiment)) %>%
  ungroup()

Join the sentiment sum with the previously merged columns

reviews2.text <- merge(reviews2.text, sentiment_reviews_sentence.sum, by='recid')

2.2.1 Summary stats checks

reviews2.text[, c('incentivized','sentiment_sum')] %>%
  group_by(incentivized) %>%
  summarize_all(mean, na.rm = TRUE)
## # A tibble: 2 x 2
##   incentivized     sentiment_sum
##   <fct>                    <dbl>
## 1 non-incentivized         0.614
## 2 incentivized             2.47

Note: The average sentiment sum of incentivized reviews (2.47) is much higher than that of the non-incentivized group (0.614). This fits with the surprising finding in section 2.1.1: incentivized reviews score slightly lower per sentence, but our previous finding that they can be much longer than non-incentivized reviews explains the higher sum.
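The arithmetic behind this can be seen with two made-up reviews (the scores below are illustrative, not taken from the data): a short review with slightly more positive sentences, and a long review with slightly less positive ones.

```r
# Toy sentence scores: 2 sentences vs. 10 sentences.
short_review <- c(0.3, 0.3)
long_review  <- rep(0.25, 10)

comparison <- data.frame(
  review         = c("short", "long"),
  sentence_count = c(length(short_review), length(long_review)),
  sentiment_mean = c(mean(short_review), mean(long_review)),
  sentiment_sum  = c(sum(short_review), sum(long_review))
)
print(comparison)
```

The short review wins on the mean while the long one wins on the sum, simply because the sum grows with the number of sentences.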

2.2.2 Plots

Boxplot: rating vs. sentence_sentiment

2.2.3 Fixed Effect Linear Model

## fixed-effects linear model
## use the summed sentence sentiment score in place of the rating
formula.fe <- sentiment_sum ~ incentivized + is_deleted + verified_purchaser
model.fe <- plm(data = reviews2.text, formula = formula.fe, index = c('item_id'), model = 'within')
# get the model summary
summary(model.fe)
## Oneway (individual) effect Within Model
## 
## Call:
## plm(formula = formula.fe, data = reviews2.text, model = "within", 
##     index = c("item_id"))
## 
## Unbalanced Panel: n = 101, T = 29-10134, N = 264016
## 
## Residuals:
##     Min.  1st Qu.   Median  3rd Qu.     Max. 
## -4.66658 -0.41532 -0.06511  0.34090 11.80856 
## 
## Coefficients:
##                              Estimate Std. Error t-value  Pr(>|t|)    
## incentivizedincentivized    1.6215649  0.0175074  92.622 < 2.2e-16 ***
## is_deleteddeleted           0.1170905  0.0058311  20.081 < 2.2e-16 ***
## verified_purchaserverified -0.1185700  0.0059273 -20.004 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Total Sum of Squares:    134740
## Residual Sum of Squares: 128380
## R-Squared:      0.047215
## Adj. R-Squared: 0.046843
## F-statistic: 4359.33 on 3 and 263912 DF, p-value: < 2.22e-16

2.2.4 Correlation: rating vs. review sentiment

cor.test(reviews2.text$rating, reviews2.text$sentiment_sum, method = "pearson")
## 
##  Pearson's product-moment correlation
## 
## data:  reviews2.text$rating and reviews2.text$sentiment_sum
## t = 253.22, df = 264014, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.4389741 0.4451123
## sample estimates:
##       cor 
## 0.4420484

The correlation between rating and the sentiment sum (0.442) is still significant, but the mean sentiment score correlates more strongly with ratings (0.501).